VAST Challenge 2021 Mini-Challenge 2.
GAStech is a Tethys-based company that has operated a natural gas production site in the island country of Kronos for over 20 years. It has produced remarkable profits and developed strong relationships with the government of Kronos, but has not been as successful in demonstrating environmental stewardship.
In January, 2014, the leaders of GAStech are celebrating their new-found fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.
This case is designed to help law enforcement from Kronos and Tethys investigate the incident using data visualization techniques. VAST Challenge 2021 comprises three challenges focusing on different aspects of case analysis. This report concentrates on visualization and analysis for Mini-Challenge 2.
Many of the Abila, Kronos-based employees of GAStech have company cars that are approved for both personal and business use. Those who do not have company cars can check out company trucks for business use, but these trucks cannot be used for personal business. The vehicles are fitted with GPS devices that record their position periodically as long as they are moving. In addition, to promote local businesses, Kronos-based companies provide a Kronos Kares benefit card to GAStech employees, giving them discounts and rewards in exchange for collecting information about their credit card purchases and preferences as recorded on loyalty cards.
The vehicle tracking data for the two weeks prior to the incident, the car assignment list, and the credit card and loyalty card transaction records are available for analysis.
The challenges to be dealt with are listed below:
| No. | Question |
|---|---|
| 1 | Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies? |
| 2 | Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find? |
| 3 | Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data? |
| 4 | Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships. |
| 5 | Do you see evidence of suspicious activity? Identify 1-10 locations where you believe the suspicious activity is occurring, and why. |
The detailed information and all the data needed for Mini-Challenge 2 are available on the official VAST Challenge 2021 website.
The dataset used for Mini-Challenge 2 includes 4 CSV files, a package of ESRI shapefiles of Abila and Kronos, and a tourist map of Abila in JPEG format, as shown in the following screenshot.
Fig.1 Dataset for visualization and analysis
The data contents in the CSV files are listed below:
| File | Description | Data Content |
|---|---|---|
| car-assignments.csv | A list of vehicle assignments by employee | Employee Last Name Employee First Name Car ID Current Employment Type (Department) Current Employment Title (job title) |
| gps.csv | vehicle tracking data | Timestamp Car ID (integer) Latitude Longitude |
| cc_data.csv | credit and debit card transaction data | Timestamp Location (name of the business) Price (real) Last 4 digits of the credit or debit card number |
| loyalty_data.csv | loyalty card transaction data | Timestamp Location (name of the business) Price (real) Loyalty Number (A 5-character code starting with L that is unique for each card) |
We used RStudio as the tool to import, process, visualize and analyze the data.
The first step is to clear the environment and remove any existing R objects.
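The line of code for this step is not reproduced in the text. A common way to do it, shown here as an assumption rather than the original code, is `rm(list = ls())`. The object `scratch_value` is a hypothetical leftover used only to demonstrate the effect.

```r
# Hypothetical leftover object from an earlier session
scratch_value <- 42

# Remove every object from the current environment
rm(list = ls())

# Confirm that nothing is left
length(ls()) == 0
```

Note that this removes objects only; loaded packages and options are not reset.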
The code chunk below installs (if necessary) and loads the packages needed for the next steps.
packages = c('ggiraph', 'plotly', 'DT', 'patchwork',
             'raster', 'sf', 'tmap', 'mapview', 'gifski',
             'tidyverse', 'mlr', 'lubridate')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
First, the code chunk below uses guess_encoding() to identify the encoding of each CSV file, so that the imported data contains no garbled characters.
guess_encoding("data/car-assignments.csv")
# A tibble: 1 x 2
encoding confidence
<chr> <dbl>
1 ASCII 1
guess_encoding("data/gps.csv")
# A tibble: 1 x 2
encoding confidence
<chr> <dbl>
1 ASCII 1
guess_encoding("data/cc_data.csv")
# A tibble: 2 x 2
encoding confidence
<chr> <dbl>
1 windows-1252 0.41
2 windows-1254 0.25
guess_encoding("data/loyalty_data.csv")
# A tibble: 2 x 2
encoding confidence
<chr> <dbl>
1 windows-1254 0.26
2 windows-1252 0.24
Based on the above results, “windows-1252” is set as the encoding for cc_data.csv and loyalty_data.csv when importing the files with the read_csv() function from the readr package (part of tidyverse). It is the top guess for cc_data.csv and a close second for loyalty_data.csv.
car_ass <- read_csv("data/car-assignments.csv")
gps <- read_csv("data/gps.csv")
cc <- read_csv("data/cc_data.csv", locale = locale(encoding = "windows-1252"))
loyalty <- read_csv("data/loyalty_data.csv", locale = locale(encoding = "windows-1252"))
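To illustrate why the encoding matters, the following base R sketch decodes the bytes of a short word with an accented character. The word "Café" is a hypothetical example, not a value quoted from the data files.

```r
# Bytes for "Café" as they would be stored in a windows-1252 file (0xE9 is "é")
raw_bytes <- as.raw(c(0x43, 0x61, 0x66, 0xE9))
as_read <- rawToChar(raw_bytes)

# Read without a declared encoding, the string is not valid UTF-8
validUTF8(as_read)

# Declaring the source encoding recovers the accented character
fixed <- iconv(as_read, from = "windows-1252", to = "UTF-8")
validUTF8(fixed)
nchar(fixed)
```

Passing `locale(encoding = "windows-1252")` to read_csv() performs the same conversion for every text column at import time.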
As shown below, we need to check whether the data types in the imported tibbles are appropriate. The Timestamp column in gps.csv and the timestamp columns in cc_data.csv and loyalty_data.csv should be in datetime (or date) format, but they are currently stored as character. In addition, CarID in car-assignments.csv, id in gps.csv and last4ccnum in cc_data.csv are identifiers and should be converted from numeric to categorical data.
Rows: 44
Columns: 5
$ LastName <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
Rows: 685,169
Columns: 4
$ Timestamp <chr> "01/06/2014 06:28:01", "01/06/2014 06:28:01", "01/~
$ id <dbl> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ lat <dbl> 36.07623, 36.07622, 36.07621, 36.07622, 36.07621, ~
$ long <dbl> 24.87469, 24.87460, 24.87444, 24.87425, 24.87417, ~
Rows: 1,490
Columns: 4
$ timestamp <chr> "01/06/2014 07:28", "01/06/2014 07:34", "01/06/20~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
Rows: 1,392
Columns: 4
$ timestamp <chr> "01/06/2014", "01/06/2014", "01/06/2014", "01/06/~
$ location <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
To achieve this, the mdy_hms(), mdy_hm() and mdy() functions in the lubridate package are used to convert the timestamps to datetime or date values, and the as.character() function is used to convert the identifiers to characters.
gps$Timestamp = mdy_hms(gps$Timestamp)
cc$timestamp = mdy_hm(cc$timestamp)
loyalty$timestamp = mdy(loyalty$timestamp)
car_ass$CarID = as.character(car_ass$CarID)
gps$id = as.character(gps$id)
cc$last4ccnum = as.character(cc$last4ccnum)
Since all transactions in the credit and loyalty card data took place in January, the day of the month, the weekday and the hour can be derived from the timestamp and stored in separate columns of cc and loyalty. The same applies to the GPS tracking data. As shown in the code chunk below, day() gets the day of the month, wday() the weekday and hour() the hour. No hour is derived for the loyalty data because its timestamp holds only the date.
cc$day = day(cc$timestamp)
cc$weekday = wday(cc$timestamp, label = T, abbr = T)
cc$hour = hour(cc$timestamp)
loyalty$day = day(loyalty$timestamp)
loyalty$weekday = wday(loyalty$timestamp, label = T, abbr = T)
gps$day = as.factor(day(gps$Timestamp))
gps$weekday = wday(gps$Timestamp, label = T, abbr = T)
gps$hour = as.factor(hour(gps$Timestamp))
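As a dependency-free sanity check of these conversions, the sketch below parses one sample string per file layout with base R instead of lubridate. The format strings are assumptions inferred from the glimpse output above; the credit card and loyalty values are hypothetical.

```r
# One sample string per timestamp layout
gps_raw     <- "01/06/2014 06:28:01"   # layout of gps.csv
cc_raw      <- "01/13/2014 19:20"      # layout of cc_data.csv (hypothetical value)
loyalty_raw <- "01/13/2014"            # layout of loyalty_data.csv (hypothetical value)

# Base R equivalents of mdy_hms(), mdy_hm() and mdy()
gps_ts     <- as.POSIXct(gps_raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
cc_ts      <- as.POSIXct(cc_raw, format = "%m/%d/%Y %H:%M", tz = "UTC")
loyalty_ts <- as.Date(loyalty_raw, format = "%m/%d/%Y")

# Derived fields: day of month, hour, ISO weekday number (1 = Monday)
cc_day     <- as.integer(format(cc_ts, "%d"))
cc_hour    <- as.integer(format(cc_ts, "%H"))
cc_isowday <- as.integer(format(cc_ts, "%u"))
```

A failed parse returns `NA`, so counting `NA` values after conversion is a quick way to spot rows that do not follow the expected layout.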
Then we explore the data and check for missing values with summarizeColumns() from the mlr package, using the code chunks below. Only CarID in car-assignments.csv has missing values (9 of them).
knitr::kable(summarizeColumns(car_ass), caption = "EDA for Car Assignment Data", digits = 2)
| name | type | na | mean | disp | median | mad | min | max | nlevs |
|---|---|---|---|---|---|---|---|---|---|
| LastName | character | 0 | NA | 0.95 | NA | NA | 1 | 2 | 38 |
| FirstName | character | 0 | NA | 0.95 | NA | NA | 1 | 2 | 43 |
| CarID | character | 9 | NA | NA | NA | NA | 1 | 1 | 35 |
| CurrentEmploymentType | character | 0 | NA | 0.70 | NA | NA | 5 | 13 | 5 |
| CurrentEmploymentTitle | character | 0 | NA | 0.80 | NA | NA | 1 | 9 | 21 |
knitr::kable(summarizeColumns(gps), caption = "EDA for GPS Tracking Data", digits = 2)
| name | type | na | mean | disp | median | mad | min | max | nlevs |
|---|---|---|---|---|---|---|---|---|---|
| Timestamp | POSIXct | 0 | NA | NA | NA | NA | 1.00 | 22.00 | 303206 |
| id | character | 0 | NA | 0.96 | NA | NA | 2317.00 | 24713.00 | 40 |
| lat | numeric | 0 | 36.06 | 0.01 | 36.06 | 0.01 | 36.05 | 36.09 | 0 |
| long | numeric | 0 | 24.88 | 0.01 | 24.88 | 0.01 | 24.83 | 24.91 | 0 |
| day | factor | 0 | NA | 0.88 | NA | NA | 12208.00 | 82786.00 | 14 |
| weekday | ordered | 0 | NA | 0.79 | NA | NA | 28829.00 | 142148.00 | 7 |
| hour | factor | 0 | NA | 0.85 | NA | NA | 182.00 | 106146.00 | 21 |
knitr::kable(summarizeColumns(cc), caption = "EDA for Credit Card Transaction Data", digits = 2)
| name | type | na | mean | disp | median | mad | min | max | nlevs |
|---|---|---|---|---|---|---|---|---|---|
| timestamp | POSIXct | 0 | NA | NA | NA | NA | 1.00 | 16 | 1116 |
| location | character | 0 | NA | 0.86 | NA | NA | 1.00 | 212 | 34 |
| price | numeric | 0 | 207.70 | 740.86 | 28.24 | 24.62 | 2.01 | 10000 | 0 |
| last4ccnum | character | 0 | NA | 0.98 | NA | NA | 4.00 | 37 | 55 |
| day | integer | 0 | 11.99 | 3.95 | 12.00 | 5.93 | 6.00 | 19 | 0 |
| weekday | ordered | 0 | NA | 0.82 | NA | NA | 104.00 | 264 | 7 |
| hour | integer | 0 | 13.86 | 4.56 | 13.00 | 7.41 | 3.00 | 22 | 0 |
knitr::kable(summarizeColumns(loyalty), caption = "EDA for Loyalty Card Transaction Data", digits = 2)
| name | type | na | mean | disp | median | mad | min | max | nlevs |
|---|---|---|---|---|---|---|---|---|---|
| timestamp | Date | 0 | NA | NA | NA | NA | 43 | 123.00 | 14 |
| location | character | 0 | NA | 0.86 | NA | NA | 1 | 195.00 | 33 |
| price | numeric | 0 | 204.33 | 719.01 | 22.84 | 16.84 | 3 | 4983.52 | 0 |
| loyaltynum | character | 0 | NA | 0.96 | NA | NA | 3 | 55.00 | 54 |
| day | integer | 0 | 12.03 | 3.95 | 13.00 | 5.93 | 6 | 19.00 | 0 |
| weekday | ordered | 0 | NA | 0.82 | NA | NA | 97 | 245.00 | 7 |
The records containing missing values in car-assignments.csv are shown below. These records all belong to truck drivers, who use company trucks that are not for personal use. As the number of missing values is small and the missing field is not critical for these employees, there is no need to clean or exclude these records.
| LastName | FirstName | CarID | CurrentEmploymentType | CurrentEmploymentTitle |
|---|---|---|---|---|
| Hafon | Albina | NA | Facilities | Truck Driver |
| Hawelon | Benito | NA | Facilities | Truck Driver |
| Hawelon | Claudio | NA | Facilities | Truck Driver |
| Mies | Henk | NA | Facilities | Truck Driver |
| Morlun | Valeria | NA | Facilities | Truck Driver |
| Morlun | Adan | NA | Facilities | Truck Driver |
| Morluniau | Cecilia | NA | Facilities | Truck Driver |
| Nant | Irene | NA | Facilities | Truck Driver |
| Scozzese | Dylan | NA | Facilities | Truck Driver |
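The table above can be produced with a simple filter on missing CarID values. The sketch below shows one way to do this in base R on a toy stand-in for car_ass; the two rows mirror entries shown in this report, and `car_demo` is a hypothetical name.

```r
# Toy stand-in for car_ass with one assigned and one unassigned employee
car_demo <- data.frame(
  LastName  = c("Calixto", "Hafon"),
  FirstName = c("Nils", "Albina"),
  CarID     = c("1", NA),
  CurrentEmploymentTitle = c("IT Helpdesk", "Truck Driver"),
  stringsAsFactors = FALSE
)

# Number of missing values per column
na_counts <- colSums(is.na(car_demo))

# Rows without an assigned CarID
no_car <- car_demo[is.na(car_demo$CarID), ]
```

Applied to the full car_ass table, the same filter should return the nine truck drivers listed above.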
First, a 2D histogram of credit card transaction frequency by location and hour is built with the code chunk below. A slider is added to filter by a range of days, and a data table is linked to the graph to show the details of the selected records.
d <- highlight_key(cc)
# Plot the 2D histogram for credit card transactions
gra_1 <- plot_ly(data = d, x = ~as.factor(hour), y = ~location,
                 hovertemplate = paste(
                   " %{yaxis.title.text}: %{y}<br>",
                   "%{xaxis.title.text}: %{x}<br>",
                   "Transaction Count: %{z}",
                   "<extra></extra>")) %>%
  add_histogram2d(colors = "Blues") %>%
  layout(title = "<b>Graph.1 Credit Card Transaction Frequency by Hour</b>",
         xaxis = list(title = "Time", tickmode = "linear"),
         yaxis = list(title = "Location", tickmode = "linear")
  )
# Add a slider to the graph to select the range of dates, and
# link a data table to show details
crosstalk::bscols(
  crosstalk::filter_slider("day", "Date of Jan",
                           d, ~day, step = 1,
                           animate = TRUE, ticks = FALSE),
  gra_1,
  DT::datatable(d, filter = c("top"), class = "hover",
                options = list(pageLength = 5,
                               columnDefs = list(
                                 list(visible = FALSE,
                                      targets = c(5, 7)))
                )),
  widths = 10)
Then the 2D histograms of credit card and loyalty card transaction frequency by location and day are created as below. A dropdown list is added to each graph to filter by card number, so that the transaction frequency of a specific card owner can be followed across days.
d1 = highlight_key(cc)
d2 = highlight_key(loyalty)
# Plot the 2D histogram of credit card transaction frequency
gra_2.1 <- plot_ly(data = d1, x = ~as.factor(day), y = ~location,
                   hovertemplate = paste(
                     " %{yaxis.title.text}: %{y}<br>",
                     "%{xaxis.title.text}: %{x}<br>",
                     "Transaction Count: %{z}",
                     "<extra></extra>")) %>%
  add_histogram2d(colors = "Blues") %>%
  layout(title = "<b>Graph.2-1 Credit Card Transaction Frequency by Day</b>",
         #annotations = list(text = "Credit Card", showarrow = F, x =10, y=32),
         xaxis = list(title = "Date of Jan", tickmode = "linear"),
         yaxis = list(title = "Location", tickmode = "linear")
  )
# Plot the 2D histogram of loyalty card transaction frequency
gra_2.2 <- plot_ly(data = d2, x = ~as.factor(day), y = ~location,
                   hovertemplate = paste(
                     " Location: %{y}<br>",
                     "Date of Jan: %{x}<br>",
                     "Transaction Count: %{z}",
                     "<extra></extra>")) %>%
  add_histogram2d(colors = "Greys") %>%
  layout(title = "<b>Graph.2-2 Loyalty Card Transaction Frequency by Day</b>",
         #annotations = list(text = "Loyalty Card", showarrow = F, x =10, y=32),
         xaxis = list(title = "Date of Jan", tickmode = "linear"),
         yaxis = list(title = "Location", tickmode = "linear", visible = TRUE)
  )
# Add a dropdown list to each graph to filter by card number
gra_2.1_c <- crosstalk::bscols(
  crosstalk::filter_select(
    "ccnum",
    "Choose last 4 credit card Number",
    d1, ~last4ccnum,
    multiple = FALSE),
  gra_2.1,
  widths = 10)
gra_2.2_c <- crosstalk::bscols(
  crosstalk::filter_select(
    "lonum",
    "Choose loyalty card number",
    d2, ~loyaltynum,
    multiple = FALSE),
  gra_2.2,
  widths = 10)
gra_2.1_c
gra_2.2_c
Now taking the GPS tracking data into account, it is necessary to draw the movement paths on the tourist map, so that we can see where the employees went and gathered during the two weeks before the disappearance.
The first step is to load the tourist map of Abila, Kronos, as a raster layer to serve as the background map, and to import the Abila GIS data layer.
bgmap <- raster("data/MC2-tourist.jpg")
bgmap
class : RasterLayer
band : 1 (of 3 bands)
dimensions : 1535, 2740, 4205900 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 0, 2740, 0, 1535 (xmin, xmax, ymin, ymax)
crs : NA
source : MC2-tourist.jpg
names : MC2.tourist
values : 0, 255 (min, max)
Abila_st <- st_read(dsn = "data/Geospatial", layer = "Abila")
Reading layer `Abila' from data source
`D:\ReginaDong\DataViz_blog\_posts\2021-07-17-assignvastchallenge\data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS: WGS 84
According to the output for bgmap, its extent is (0, 2740, 0, 1535) for (xmin, xmax, ymin, ymax), while the bounding box of Abila_st is (24.82401, 24.90997, 36.04502, 36.09492). The extent of bgmap therefore needs to be reset to match Abila_st; otherwise the GPS tracks will not line up with the background map. The code chunk below sets the extreme coordinates of bgmap.
xmin(bgmap) = 24.82401
xmax(bgmap) = 24.90997
ymin(bgmap) = 36.04502
ymax(bgmap) = 36.09492
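Resetting the extent amounts to a linear stretch of the pixel grid onto the geographic bounding box. The sketch below spells out that arithmetic in base R, assuming the image covers exactly the shapefile's bounding box; `pixel_to_lonlat()` is a hypothetical helper, not part of the raster package.

```r
# Geographic bounding box taken from the Abila shapefile output
bbox <- c(xmin = 24.82401, xmax = 24.90997, ymin = 36.04502, ymax = 36.09492)

# Image size in pixels, from the RasterLayer summary
n_col <- 2740
n_row <- 1535

# Hypothetical helper: map a pixel coordinate (origin at bottom-left) to lon/lat
pixel_to_lonlat <- function(px, py) {
  lon <- bbox[["xmin"]] + px / n_col * (bbox[["xmax"]] - bbox[["xmin"]])
  lat <- bbox[["ymin"]] + py / n_row * (bbox[["ymax"]] - bbox[["ymin"]])
  c(lon = lon, lat = lat)
}

pixel_to_lonlat(0, 0)          # bottom-left corner of the image
pixel_to_lonlat(n_col, n_row)  # top-right corner of the image
```

If the tourist map does not cover exactly the same area as the shapefile, the tracks will be slightly offset from the drawn streets, which is worth keeping in mind when reading stops off the map.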
The code chunk below is used to convert GPS spatial data into a Simple Feature (SF) data frame.
gps_sf <- st_as_sf(gps,
                   coords = c("long", "lat"),
                   crs = 4326)
Before combining the background map and the GPS tracking lines to generate the movement paths, the spatial data needs to be grouped, first by id and day and then by id and hour.
# Group by id and day
gps_path <- gps_sf %>%
  group_by(id, day) %>%
  summarize(m = mean(Timestamp),
            do_union = FALSE) %>%
  st_cast("LINESTRING")
np = npts(gps_path, by_feature = TRUE)
gps_path2 <- cbind(gps_path, np) %>%
  filter(np > 1) # exclude orphan coordinate records
# Group by id and hour
gps_hour <- gps_sf %>%
  group_by(id, hour) %>%
  summarise(m = mean(Timestamp),
            do_union = FALSE) %>%
  st_cast("LINESTRING")
np = npts(gps_hour, by_feature = TRUE)
gps_hour2 <- cbind(gps_hour, np) %>%
  filter(np > 1) # exclude orphan coordinate records
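The `np > 1` filter exists because a line needs at least two points. The base R sketch below illustrates the same idea on a toy table without any spatial packages; `gps_demo` and its coordinates are hypothetical.

```r
# Toy GPS records: car "1" has three points on day 6, car "2" only one
gps_demo <- data.frame(
  id   = c("1", "1", "1", "2"),
  day  = c(6, 6, 6, 6),
  lat  = c(36.0762, 36.0763, 36.0765, 36.0500),
  long = c(24.8747, 24.8746, 24.8744, 24.8800)
)

# Number of points per (id, day) group
pts <- aggregate(lat ~ id + day, data = gps_demo, FUN = length)
names(pts)[names(pts) == "lat"] <- "np"

# Only groups with at least two points can form a line
usable <- pts[pts$np > 1, ]
```

Here only car "1" survives the filter, which corresponds to dropping orphan coordinate records in the chunk above.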
The filter() function selects a single day, and the col argument of tm_lines() colors the lines by id, so that the map shows the movement tracks of all cars on a specific date. The code chunk below creates the graph.
# Filter GPS spatial data by date of Jan
gps_path_selected <- gps_path2 %>%
  filter(day == "16")
# Plot the movement path
tmap_mode("view")
gra_3.1 <- tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col = "id", palette = "Dark2") +
  tmap_options(max.categories = 44)
gra_3.1
Graph.3.1 GPS Moving Route on a Specific Day
The filter() function selects a single id, and the col argument of tm_lines() colors the lines by day, so that the map shows the movement tracks of a specific car across all days. The code chunk below creates the graph.
# Filter GPS spatial data by CarID
gps_path_selected2 <- gps_path2 %>%
  filter(id == "1")
# Plot the movement path
tmap_mode("view")
gra_3.2 <- tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected2) +
  tm_lines(col = "day", palette = "Dark2")
gra_3.2
Graph.3.2 GPS Moving Route of a Specific CarID
The filter() function selects a set of hours, and the col argument of tm_lines() colors the lines by id, so that the map shows the movement tracks of all cars during specific hours of the day. The code chunk below creates the graph.
# Filter GPS spatial data by hour of the day
gps_hour_selected <- gps_hour2 %>%
  filter(hour %in% c(23, 0, 1, 2))
# Plot the movement path
tmap_mode("view")
gra_4 <- tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_hour_selected) +
  tm_lines(col = "id", palette = "Dark2") +
  tmap_options(max.categories = 44)
gra_4
Graph.4 GPS Moving Route in Specific Hours
In order to compare the money spent at different locations, a line chart showing the average price paid by credit card and loyalty card on each day of the week is created by the code chunk below.
The code chunk first calculates the average price grouped by weekday and location using the group_by() and summarize() functions, then combines the two results into one table with rbind(). This table is used to plot the line chart with functions from the plotly package. A dropdown list is added to select the location to display.
# Calculate the average price by weekday and location, then combine the results
mean_price <- cc %>%
  group_by(weekday, location) %>%
  summarize(Avg.Price = mean(price)) %>%
  ungroup() %>%
  mutate(card = "Credit Card") %>%
  rbind(
    loyalty %>%
      group_by(weekday, location) %>%
      summarize(Avg.Price = mean(price)) %>%
      ungroup() %>%
      mutate(card = "Loyalty Card")
  )
# Plot the line chart
d <- highlight_key(mean_price)
gra_5 <- plot_ly(data = d, x = ~weekday, y = ~Avg.Price,
                 color = ~card, colors = "Paired",
                 linetype = ~card,
                 type = 'scatter', mode = 'lines+markers') %>%
  layout(title = "<b>Graph.5 Average Transaction Price by Weekday</b>")
# Add a dropdown list to select one location
crosstalk::bscols(
  crosstalk::filter_select("loc", "Choose a location first",
                           d, ~location, multiple = FALSE),
  gra_5,
  widths = 10)
As the line chart above shows neither the comparison of transaction prices between locations nor the outliers among them clearly, a box plot is created by the code chunk below. The first step combines the cc and loyalty datasets as the plotting data source; plot_ly() then generates the graph. A dropdown list is added to filter by weekday, and a data table is linked to the box plot to show the details of the selected records.
# Combine cc and loyalty data
cards <- cc %>%
  select(-hour) %>%
  rename(cardnum = last4ccnum) %>%
  mutate(card = "Credit Card") %>%
  rbind(loyalty %>%
          rename(cardnum = loyaltynum) %>%
          mutate(card = "Loyalty Card"))
# Generate the box plot
d <- highlight_key(cards)
gra_6 <- plot_ly(data = d, x = ~location, y = ~price,
                 color = ~card, colors = "Paired",
                 type = 'box', boxmean = TRUE) %>%
  layout(title = "<b>Graph.6 Box Plot of Transaction Price by Location</b>",
         boxmode = "group")
# Add a dropdown list to filter the weekday, and
# link a data table to show details
crosstalk::bscols(
  crosstalk::filter_select("wdy", "Choose the weekday",
                           d, ~weekday, multiple = TRUE),
  gra_6,
  DT::datatable(d, filter = c("top"), class = "hover",
                options = list(pageLength = 5,
                               columnDefs = list(
                                 list(visible = FALSE,
                                      targets = c(5)))
                )),
  widths = 22)
Using just the credit and loyalty card data, identify the most popular locations, and when they are popular.
In Graph.1, Graph.2-1 and Graph.2-2, the darker the color of a cell, the more frequently the place was visited.
With all recorded dates and all credit card numbers selected, the transaction frequency by hour and by date is shown below. Katerina’s Cafe, Hippokampos, Hallowed Grounds, Guy’s Gyros and Brew’ve Been Served were clearly visited most often.
Fig.2 The total transaction frequency of credit card by hour and date
Similarly, in the transaction frequency of loyalty cards by date shown below, Katerina’s Cafe, Hippokampos, Hallowed Grounds, Guy’s Gyros and Brew’ve Been Served still have the darkest cells, which means that many people made purchases at these places.
Fig.3 The total transaction frequency of loyalty card by date
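The ranking read off the heatmaps can also be tabulated directly. The sketch below is a minimal base R illustration on hypothetical toy data (`cc_demo` stands in for cc); it counts transactions per location and finds each location's busiest hour.

```r
# Toy stand-in for cc with only the columns needed here (values are hypothetical)
cc_demo <- data.frame(
  location = c("Katerina's Cafe", "Katerina's Cafe", "Katerina's Cafe",
               "Hippokampos", "Hippokampos", "Kronos Mart"),
  hour = c(20, 20, 13, 13, 13, 3)
)

# Transaction count per location, most visited first
visits <- sort(table(cc_demo$location), decreasing = TRUE)

# Busiest hour for each location
by_loc_hour <- table(cc_demo$location, cc_demo$hour)
peak_hour <- colnames(by_loc_hour)[apply(by_loc_hour, 1, which.max)]
names(peak_hour) <- rownames(by_loc_hour)
```

On the real data, `head(visits, 5)` would list the five most visited locations, and `peak_hour` indicates when each of them is most popular.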
What anomalies do you see? What corrections would you recommend to correct these anomalies?
Abnormal findings spotted in the credit and loyalty card data include:
Kronos Mart: A visit to this place was recorded at 3 AM, which is abnormal because a mart is generally not open at that time.
Fig.4 Transaction record in Kronos Mart at 3 AM
Frydos Autosupply n’ More: An extremely high transaction price was recorded on 13th Jan, a Monday, as a single credit card transaction with no matching loyalty record. In addition, the average transaction price of this place by weekday, shown in Fig.6, is abnormally high on Monday.
Fig.5 The outlier of transaction price
Fig.6 The average transaction price of Frydos Autosupply n’ More
Daily Dealz: Only one transaction was found at this place, at 6 AM on 13th Jan, and its price is quite low.
Fig.7 The transaction record of Daily Dealz
Hippokampos: A visit to this place was recorded at 10 PM, the latest time in the data, and it is not clear what kind of business operates here.
Fig.8 The transaction record of Hippokampos
Abila Scrapyard: Judging from its name, this is a place where scrap is collected before being discarded, yet the transaction prices here are quite high, which is counterintuitive.
Fig.9 The transaction price of credit and loyalty card in Abila Scrapyard
As for correcting the data, it is recommended to check the transaction times recorded at Kronos Mart and Hippokampos, the abnormal prices at Frydos Autosupply n’ More and Abila Scrapyard, and whether any records for Daily Dealz are missing.
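Price anomalies like the ones above could also be flagged programmatically. The sketch below applies the conventional 1.5 × IQR rule through base R's `boxplot.stats()`; the prices are hypothetical, except that 10000 mirrors the maximum in the credit card summary table.

```r
# Hypothetical prices for one location; 10000 mirrors the maximum credit card price
prices <- c(85.2, 120.5, 99.9, 143.0, 110.4, 10000)

# Values lying more than 1.5 * IQR beyond the hinges are reported as outliers
flagged <- boxplot.stats(prices)$out
flagged
```

Running this per location (for example with `tapply()`) would give a list of candidate outliers to inspect by hand, which complements the visual check in the box plot.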
Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data?
Many people gathered at Katerina’s Cafe on Saturdays.
As shown in the movement path graphs Fig.10 and Fig.11 below, Katerina’s Cafe is quite close to the GAStech office. This is unusual: with many other cafes in Abila, there is little reason for employees to return to a cafe next to their workplace at the weekend.
Fig.10 The moving path around Katerina’s Cafe on 11th Jan, Sat
Fig.11 The moving path around Katerina’s Cafe on 18th Jan, Sat
Analyzed together with Graph.1 (credit card transaction frequency by hour), as shown in Fig.12 and Fig.13, these people mostly gathered at the same time on the two Saturdays, around 7 PM to 8 PM in particular. Hence it is all the more suspicious that many people gathered at a place near GAStech at the same time on a Saturday.
Fig.12 The transaction frequency of Katerina’s Cafe on 11th Jan, Sat
Fig.13 The transaction frequency of Katerina’s Cafe on 18th Jan, Sat
Transactions at Abila Scrapyard look normal when analyzed together with the movement paths. As mentioned in section 4.1, the transaction prices at Abila Scrapyard look quite high. But on all the dates with transactions, the car that stopped at Abila Scrapyard was mainly CarID 106, which should be a company truck for business use. This suggests that GAStech and Abila Scrapyard did have some business dealings, which makes the high transaction amounts more reasonable.
Fig.14-1 The movement path around Abila Scrapyard on 7th Jan
Fig.14-2 The movement path around Abila Scrapyard on 9th Jan
Fig.14-3 The movement path around Abila Scrapyard on 14th Jan
Fig.14-4 The movement path around Abila Scrapyard on 16th Jan
What discrepancies between vehicle, credit, and loyalty card data do you find?
Fig.15 The moving path between 11PM to 2AM of the next day
Fig.16 The moving path during 6AM of the next day
Can you infer the owners of each credit card and loyalty card? What is your evidence?
Set CarID = 1 and plot the movement path. This person clearly favors Hallowed Grounds, having been there on several days, but never went to Katerina’s Cafe. Ouzeri Elian, Albert’s Fine Clothing and U-Pump are also places he visits. These characteristics match the consumption pattern of the owner of credit card 9551 and loyalty card L5777.
Fig.17 The moving path of CarID 1
Fig.18 Transaction frequency of credit and loyalty card by day
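Because the GPS devices record only while a vehicle is moving, a long silence between two consecutive records suggests that the car was parked. The sketch below is one possible way to list such stops in base R rather than reading them off the map. The timestamps and the 5-minute threshold are assumptions for illustration.

```r
# Toy track for one car; timestamps and coordinates are hypothetical
track <- data.frame(
  Timestamp = as.POSIXct(c("2014-01-06 07:20:00", "2014-01-06 07:20:05",
                           "2014-01-06 07:45:00", "2014-01-06 07:45:04"),
                         tz = "UTC"),
  lat  = c(36.0762, 36.0763, 36.0763, 36.0765),
  long = c(24.8747, 24.8746, 24.8746, 24.8744)
)

# Minutes between consecutive records
gap_min <- as.numeric(diff(track$Timestamp), units = "mins")

# Assumed threshold: a silence longer than 5 minutes counts as a stop
stop_idx <- which(gap_min > 5)
stops <- data.frame(
  arrived  = track$Timestamp[stop_idx],
  departed = track$Timestamp[stop_idx + 1],
  lat      = track$lat[stop_idx],
  long     = track$long[stop_idx]
)
```

The stop coordinates and times can then be compared with the transaction locations and timestamps of a candidate credit card.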
Set CarID = 2 and plot the movement path. The car owner’s favorite places are Bean There Done That, General Grocer and places near Guy’s Gyros. In addition, the car appears to have stopped at Albert’s Fine Clothing on 16th Jan, as shown in the GPS track below. Thus the owner of credit card 7819 and loyalty card L5259 is most likely the owner of CarID 2, as shown in the following 2D histograms.
Fig.19 The moving path of CarID 2
Fig.20 Transaction frequency of credit and loyalty card by day
Set CarID = 3 and plot the movement path shown below, which looks quite similar to that of CarID 2. Thus the owner of credit card 1877 and loyalty card L3014, whose transactions are the most similar to those of credit card 6895 and loyalty card L3366, is most likely the owner of CarID 3.
Fig.21 The moving path of CarID 3
Fig.22 Transaction frequency of credit and loyalty card by day
Set CarID = 4 and plot the movement path shown below. Following the same logic, compare the frequently visited spots on the path with the consumption patterns in the credit card 2D histograms to find the most similar pair of car owner and credit card owner, then use the credit card to find the loyalty card with the most similar transaction frequency in its 2D histogram. On this basis, the owner of CarID 4 is most likely the owner of credit card 7688 and loyalty card L4164.
Fig.23 The moving path of CarID 4
Fig.24 Transaction frequency of credit and loyalty card by day
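The credit card to loyalty card step of this logic can be approximated in code by pairing transactions that agree on date, location and price. The sketch below uses base R on toy data. The card numbers come from this report, while the dates and prices are hypothetical. It is offered as a possible complement to the visual comparison, not as the method used here.

```r
# Toy transactions; card numbers come from the text, dates and prices are hypothetical
cc_demo <- data.frame(
  date = as.Date(c("2014-01-06", "2014-01-07", "2014-01-06")),
  location = c("Hallowed Grounds", "U-Pump", "General Grocer"),
  price = c(11.34, 52.22, 28.73),
  last4ccnum = c("9551", "9551", "7819"),
  stringsAsFactors = FALSE
)
loyalty_demo <- data.frame(
  date = as.Date(c("2014-01-06", "2014-01-07", "2014-01-06")),
  location = c("Hallowed Grounds", "U-Pump", "General Grocer"),
  price = c(11.34, 52.22, 28.73),
  loyaltynum = c("L5777", "L5777", "L5259"),
  stringsAsFactors = FALSE
)

# Pair transactions that agree on date, location and price
pairs <- merge(cc_demo, loyalty_demo, by = c("date", "location", "price"))

# Count how often each credit card / loyalty card pair co-occurs
pair_counts <- as.data.frame(
  table(cc = pairs$last4ccnum, loyalty = pairs$loyaltynum),
  stringsAsFactors = FALSE
)
pair_counts <- pair_counts[pair_counts$Freq > 0, ]
```

Pairs with many matching transactions are strong candidates for belonging to the same person. Exact matching will miss purchases where the two records differ in price or where only one card was used, so low counts should be treated as uncertain.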
Set CarID = 5 and plot the movement path shown below. The car appears to have stopped at Kronos Mart on 14th Jan. Based on a comparison with the following 2D histograms, the owner of CarID 5 is most likely the owner of credit card 6899 and loyalty card L6267.
Fig.25 The moving path of CarID 5
Fig.26 Transaction frequency of credit and loyalty card by day
Set CarID = 6 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 6 is most likely the owner of credit card 4434 and loyalty card L2169.
Fig.27 The moving path of CarID 6
Fig.28 Transaction frequency of credit and loyalty card by day
Set CarID = 7 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 7 is most likely the owner of credit card 2540 and loyalty card L7291.
Fig.29 The moving path of CarID 7
Fig.30 Transaction frequency of credit and loyalty card by day
Set CarID = 8 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 8 is most likely the owner of credit card 3484 and loyalty card L2490.
Fig.31 The moving path of CarID 8
Fig.32 Transaction frequency of credit and loyalty card by day
Set CarID = 9 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 9 is most likely the owner of credit card 1321 and loyalty card L4149.
Fig.33 The moving path of CarID 9
Fig.34 Transaction frequency of credit and loyalty card by day
Set CarID = 10 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 10 is most likely the owner of credit card 8332 and loyalty card L2070.
Fig.35 The moving path of CarID 10
Fig.36 Transaction frequency of credit and loyalty card by day
Set CarID = 11 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 11 is most likely the owner of credit card 1415 and loyalty card L7783.
Fig.37 The moving path of CarID 11
Fig.38 Transaction frequency of credit and loyalty card by day
Set CarID = 12 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 12 is most likely the owner of credit card 7792 and loyalty card L5756.
Fig.39 The moving path of CarID 12
Fig.40 Transaction frequency of credit and loyalty card by day
Set CarID = 13 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 13 is most likely the owner of credit card 5407 and loyalty card L4034.
Fig.41 The moving path of CarID 13
Fig.42 Transaction frequency of credit and loyalty card by day
Set CarID = 14 and plot the movement path shown below. Based on a comparison with the following 2D histograms, the owner of CarID 14 is most likely the owner of credit card 9617 and loyalty card L5553.
Fig.43 The moving path of CarID 14
Fig.44 Transaction frequency of credit and loyalty card by day
Set CarID = 15 and get the movement path as shown below. According to comparison with following 2d histograms, especially the car it stopped by Frank’s Fuels on 8th Jan as shown in Fig.45, the owner of CarID 15 is most likely the owner of credit card 3853 and loyalty card L1485.
Fig.45 The moving path of CarID 15
Fig.46 Transaction frequency of credit and loyalty card by day
Set CarID = 16 and get the movement path as shown below. According to comparison with following 2d histograms, especially the car looks like it was parked at Roberts and Sons twice, the owner of CarID 16 is most likely the owner of credit card 7354 and loyalty card L9254.
Fig.47 The moving path of CarID 16
Fig.48 Transaction frequency of credit and loyalty card by day
Set CarID = 17 to obtain the movement path shown below. A comparison with the 2D histograms that follow, in particular the car's apparent stops at Ahaggo Museum on 17th Jan and Ouzeri Elian on 19th Jan shown in Fig.49, suggests that the owner of CarID 17 is most likely the owner of credit card 7384 and loyalty card L3800.
Fig.49 The moving path of CarID 17
Fig.50 Transaction frequency of credit and loyalty card by day
Set CarID = 18 to obtain the movement path shown below. A comparison with the 2D histograms that follow, in particular the car's apparent stop at General Grocer on 11th Jan shown in the figure below, suggests that the owner of CarID 18 is most likely the owner of credit card 8129 and loyalty card L8328.
Fig.51 The moving path of CarID 18
Fig.52 Transaction frequency of credit and loyalty card by day
Set CarID = 19 to obtain the movement path shown below. A comparison with the 2D histograms that follow, in particular the car's apparent stops at General Grocer on 19th Jan and at Ouzeri Elian around 6th, 7th and 18th Jan shown in the figure below, suggests that the owner of CarID 19 is most likely the owner of credit card 6895 and loyalty card L3366.
Fig.53 The moving path of CarID 19
Fig.54 Transaction frequency of credit and loyalty card by day
Set CarID = 20 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 20 is most likely the owner of credit card 5368 and loyalty card L2247.
Fig.55 The moving path of CarID 20
Fig.56 Transaction frequency of credit and loyalty card by day
Set CarID = 21 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 21 is most likely the owner of credit card 1286 and loyalty card L3572.
Fig.57 The moving path of CarID 21
Fig.58 Transaction frequency of credit and loyalty card by day
Set CarID = 22 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 22 is most likely the owner of credit card 4498 and loyalty card L9406.
Fig.59 The moving path of CarID 22
Fig.60 Transaction frequency of credit and loyalty card by day
Set CarID = 23 to obtain the movement path shown below. A comparison with the 2D histograms that follow, in particular the car's apparent stop at Coffee Shack on 14th Jan shown in Fig.61, suggests that the owner of CarID 23 is most likely the owner of credit card 7117 and loyalty card L6417.
Fig.61 The moving path of CarID 23
Fig.62 Transaction frequency of credit and loyalty card by day
Set CarID = 24 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 24 is most likely the owner of credit card 9683 and loyalty card L7291.
Fig.63 The moving path of CarID 24
Fig.64 Transaction frequency of credit and loyalty card by day
Set CarID = 25 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 25 is most likely the owner of credit card 2142 and loyalty card L9637.
Fig.65 The moving path of CarID 25
Fig.66 Transaction frequency of credit and loyalty card by day
Set CarID = 26 to obtain the movement path shown below. A comparison with the 2D histograms that follow, in particular the car's apparent stop at Roberts and Sons on 11th Jan shown in Fig.67, suggests that the owner of CarID 26 is most likely the owner of credit card 1310 and loyalty card L8012.
Fig.67 The moving path of CarID 26
Fig.68 Transaction frequency of credit and loyalty card by day
Set CarID = 27 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 27 is most likely the owner of credit card 3492 and loyalty card L7814.
Fig.69 The moving path of CarID 27
Fig.70 Transaction frequency of credit and loyalty card by day
Set CarID = 28 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 28 is most likely the owner of credit card 2463 and loyalty card L6886.
Fig.71 The moving path of CarID 28
Fig.72 Transaction frequency of credit and loyalty card by day
Set CarID = 29 to obtain the movement path shown below. A comparison with the 2D histograms that follow suggests that the owner of CarID 29 is most likely the owner of credit card 2418 and loyalty card L9018.
Fig.73 The moving path of CarID 29
Fig.74 Transaction frequency of credit and loyalty card by day
Set CarID = 30 to obtain the movement path shown below. A comparison with the 2D histograms that follow, in particular the car's apparent stop at Ahaggo Museum on 12th Jan shown in Fig.75, suggests that the owner of CarID 30 is most likely the owner of credit card 6901 and loyalty card L9363.
Fig.75 The moving path of CarID 30
Fig.76 Transaction frequency of credit and loyalty card by day
Set CarID = 31 to obtain the movement path shown below. The car was clearly parked at Desafio Golf Course on 19th Jan, and tracking data exist only from 17th to 19th Jan. A comparison with the 2D histograms that follow suggests that the owner of CarID 31 is most likely the owner of credit card 5010 and loyalty card L2459.
Fig.77 The moving path of CarID 31
Fig.78 Transaction frequency of credit and loyalty card by day
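A short tracking window like the one noted for CarID 31 can be confirmed by listing the distinct calendar days on which a car has GPS records. The Python sketch below illustrates the idea; the function name `tracking_days` and the timestamps are hypothetical and only mimic a 17th to 19th Jan window.

```python
from datetime import datetime


def tracking_days(timestamps):
    """Return the sorted distinct calendar days covered by a car's GPS timestamps."""
    return sorted({ts.date() for ts in timestamps})


# Hypothetical timestamps for illustration only.
timestamps = [
    datetime(2014, 1, 17, 8, 0),
    datetime(2014, 1, 18, 9, 30),
    datetime(2014, 1, 19, 13, 15),
    datetime(2014, 1, 19, 17, 45),
]

days = tracking_days(timestamps)
first_day, last_day = days[0], days[-1]
print(first_day, last_day, len(days))
```

With a window this short, the match to a card rests on fewer days of evidence than for cars tracked over the full two weeks, so it should be treated with more caution.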
Set CarID = 32 to obtain the movement path shown below. The car was clearly parked at Desafio Golf Course on 19th Jan. A comparison with the 2D histograms that follow suggests that the owner of CarID 32 is most likely the owner of credit card 8156 and loyalty card L5224.
Fig.79 The moving path of CarID 32
Fig.80 Transaction frequency of credit and loyalty card by day